Machine Learning Model for the Planetary Albedo

The goal of the project is to use machine learning techniques to identify relationships between planetary mapped datasets, providing deeper understanding of planetary surfaces and predictive power for surfaces with incomplete datasets.

Mentors

  • Patrick Peplowski (JHUAPL)
  • Sergei Gleyzer (University of Alabama)
  • Jason Terry (University of Georgia)

Task 1. Predicting the Lunar Albedo Based on Chemical Composition

Using the data found in ML4SCI_GSoC/Messenger/Moon/, select a subset of the Moon's surface to train your model to identify relationships between albedo and composition. Then, make a prediction about the albedo of the untrained portion of the map using just the chemical data. Compare your albedo prediction to the albedo map. How did your algorithm perform? Choose a metric to quantify your performance.

The albedo map, LPFe (iron map), LPK (potassium map), LPTh (thorium map), and LPTi (titanium map) should be used for this study. The maps are CSV files whose values represent the element concentration at each location. Make sure you can reproduce the maps above to verify you are reading the data correctly.
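
As a minimal sketch of the loading step: the maps are headerless CSV grids, so `pd.read_csv(..., header=None)` reads each one as a latitude × longitude array. The real call would point at a file such as `ML4SCI_GSoC/Messenger/Moon/LPTh.csv` (path assumed); here a tiny in-memory string stands in for the file.

```python
import io
import pandas as pd

# Each map is a plain CSV grid with no header row; every cell is the
# albedo or element concentration at one lat/lon bin.
# Real usage: pd.read_csv("ML4SCI_GSoC/Messenger/Moon/LPTh.csv", header=None)
csv_text = "0.33,0.34,0.32,0.31\n0.30,0.29,0.28,0.27\n0.25,0.26,0.24,0.23\n"
albedo = pd.read_csv(io.StringIO(csv_text), header=None)

print(albedo.shape)  # rows = latitude bins, columns = longitude bins
# A quick matplotlib call, plt.imshow(albedo.values), reproduces the map
# and confirms the data were read correctly.
```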

Data Loading

In [1]:
In [2]:
In [3]:
Out[3]:
0 1 2 3 4 5 6 7 8 9 ... 710 711 712 713 714 715 716 717 718 719
0 0.331936 0.332611 0.332240 0.331028 0.331094 0.332614 0.331964 0.329994 0.327853 0.326532 ... 0.330389 0.329089 0.330334 0.333719 0.334709 0.334640 0.332491 0.334664 0.332983 0.331635
1 0.338990 0.340417 0.334623 0.333716 0.331404 0.331733 0.335648 0.335849 0.333166 0.332413 ... 0.350386 0.346509 0.341890 0.345887 0.345619 0.344203 0.345772 0.341238 0.342606 0.338984
2 0.324930 0.325832 0.328177 0.325871 0.321231 0.321791 0.322595 0.325254 0.329132 0.325335 ... 0.329577 0.332204 0.330471 0.330105 0.331836 0.335386 0.335075 0.333190 0.327436 0.330122

3 rows × 720 columns

In [4]:
Out[4]:
(360, 720)
In [5]:
In [6]:
In [7]:
In [8]:
In [9]:
In [10]:

Data Wrangling

All the Lunar Prospector chemical maps share the same central region of high concentration, and in fact the lunar albedo map also shows high intensity in that region.

In [11]:
In [12]:
In [13]:
In [14]:

I chose thorium to predict the lunar albedo, as these two maps are highly correlated; with a small dataset, thorium should give better results than any other element.

Data Preparation

In [15]:
0
0
In [16]:
11.644
0.003663
In [17]:
0.50656
0.0968975

I used min-max scaling to normalize the data to the range [0, 1].
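
A minimal sketch of that scaling step (the helper name `minmax_scale` is mine; the sample values echo the thorium min/max printed above):

```python
import numpy as np

def minmax_scale(arr):
    """Rescale an array linearly so its values span [0, 1]."""
    lo, hi = arr.min(), arr.max()
    return (arr - lo) / (hi - lo)

# Stand-in thorium values; 0.003663 and 11.644 match the min/max shown earlier.
th = np.array([[0.003663, 5.0],
               [8.2, 11.644]])
scaled = minmax_scale(th)
print(scaled.min(), scaled.max())  # 0.0 1.0
```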

In [18]:
Current max value:  1.0
Current min value:  0.0
Current max value:  1.0
Current min value:  0.0

Applying Machine Learning Models

Approach 1: Directly Applying a Machine Learning Model

In this approach I used the scaled dataset to make predictions, trying 8 models:

  • Ridge Regressor
  • DecisionTreeRegressor
  • KNeighborsRegressor
  • Lasso Regressor
  • ElasticNet Regressor
  • ExtraTreesRegressor
  • RandomForestRegressor
  • AdaBoostRegressor
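
The fit-and-score loop over these 8 regressors can be sketched as follows, here on synthetic stand-in data (4 random feature columns standing in for the 4 chemical maps), using r-squared, MSE, and MAE as in the outputs below:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import (ExtraTreesRegressor, RandomForestRegressor,
                              AdaBoostRegressor)
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 4))                                  # stand-in chemical maps
y = X @ np.array([0.5, 0.2, 0.2, 0.1]) + 0.05 * rng.random(500)  # stand-in albedo

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Ridge": Ridge(), "DecisionTree": DecisionTreeRegressor(random_state=0),
    "KNeighbors": KNeighborsRegressor(), "Lasso": Lasso(),
    "ElasticNet": ElasticNet(), "ExtraTrees": ExtraTreesRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}
scores = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = (r2_score(y_te, pred),
                    mean_squared_error(y_te, pred),
                    mean_absolute_error(y_te, pred))

for name, (r2, mse, mae) in scores.items():
    print(f"{name:12s} r2={r2:.4f} mse={mse:.6f} mae={mae:.6f}")
```

On the synthetic data Lasso and ElasticNet (with default regularization on [0, 1]-scaled inputs) collapse to a near-constant prediction, which mirrors the near-zero r-squared scores they produce on the real maps below.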
In [19]:
In [20]:
In [21]:
The r-squared score of the model is  0.8206784847550181
The mean squared error is 0.005892834710396798
The mean absolute error is 0.0550506665220026
Out[21]:
array([[0.50514422, 0.49458489, 0.51552166, ..., 0.4424673 , 0.50928775,
        0.47496572],
       [0.67346122, 0.69105627, 0.67280473, ..., 0.73127988, 0.72651785,
        0.71465887],
       [0.3519475 , 0.212916  , 0.37955839, ..., 0.37336505, 0.4446236 ,
        0.34238651],
       ...,
       [0.50514422, 0.49458489, 0.51552166, ..., 0.4424673 , 0.50928775,
        0.47496572],
       [0.73816847, 0.75714566, 0.75506912, ..., 0.74526562, 0.76771687,
        0.76147756],
       [0.26512149, 0.07617013, 0.14320378, ..., 0.27317093, 0.39069052,
        0.232629  ]])
In [22]:
The r-squared score of the model is  0.8111417388195891
The mean squared error is 0.006176943623503447
The mean absolute error is 0.05572576802012568
Out[22]:
array([[0.49293589, 0.49085698, 0.51322212, ..., 0.43101277, 0.4897868 ,
        0.46018553],
       [0.65998981, 0.68397072, 0.64543671, ..., 0.74328766, 0.73154141,
        0.71282355],
       [0.35941868, 0.19430207, 0.4069136 , ..., 0.39243046, 0.46376305,
        0.33759359],
       ...,
       [0.49293589, 0.49085698, 0.51322212, ..., 0.43101277, 0.4897868 ,
        0.46018553],
       [0.83798652, 0.87833988, 0.87610627, ..., 0.83058375, 0.84272248,
        0.8584975 ],
       [0.26154054, 0.07004162, 0.13962887, ..., 0.2775469 , 0.39307847,
        0.23092252]])
In [23]:
The r-squared score of the model is  0.7759683695406576
The mean squared error is 0.007370285815624743
The mean absolute error is 0.062097094384140475
Out[23]:
array([[0.49123676, 0.4332101 , 0.46938492, ..., 0.41628508, 0.47631353,
        0.48614768],
       [0.65063671, 0.67222722, 0.65974451, ..., 0.71258511, 0.7280384 ,
        0.70132047],
       [0.38975274, 0.26212056, 0.38356382, ..., 0.33524167, 0.41172332,
        0.36029093],
       ...,
       [0.49123676, 0.4332101 , 0.46938492, ..., 0.41628508, 0.47631353,
        0.48614768],
       [0.83996747, 0.8759814 , 0.8521089 , ..., 0.83840768, 0.84525995,
        0.86281636],
       [0.28226291, 0.09766403, 0.17188582, ..., 0.24895516, 0.36987698,
        0.24304934]])
In [24]:
The r-squared score of the model is  -0.01510735332710171
The mean squared error is 0.04502023898241168
The mean absolute error is 0.16810017452922182
Out[24]:
array([[0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ],
       [0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ],
       [0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ],
       ...,
       [0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ],
       [0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ],
       [0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ]])
In [25]:
The r-squared score of the model is  -0.015107353327101706
The mean squared error is 0.04502023898241168
The mean absolute error is 0.1681001745292218
Out[25]:
array([[0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ],
       [0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ],
       [0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ],
       ...,
       [0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ],
       [0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ],
       [0.60033034, 0.58656015, 0.59846991, ..., 0.60307361, 0.64124228,
        0.6140602 ]])
In [26]:
The r-squared score of the model is  0.8111417388195892
The mean squared error is 0.0061769436235034475
The mean absolute error is 0.05572576802012568
Out[26]:
array([[0.49293589, 0.49085698, 0.51322212, ..., 0.43101277, 0.4897868 ,
        0.46018553],
       [0.65998981, 0.68397072, 0.64543671, ..., 0.74328766, 0.73154141,
        0.71282355],
       [0.35941868, 0.19430207, 0.4069136 , ..., 0.39243046, 0.46376305,
        0.33759359],
       ...,
       [0.49293589, 0.49085698, 0.51322212, ..., 0.43101277, 0.4897868 ,
        0.46018553],
       [0.83798652, 0.87833988, 0.87610627, ..., 0.83058375, 0.84272248,
        0.8584975 ],
       [0.26154054, 0.07004162, 0.13962887, ..., 0.2775469 , 0.39307847,
        0.23092252]])
In [27]:
The r-squared score of the model is  0.8283231480670221
The mean squared error is 0.005661856441803238
The mean absolute error is 0.05420717615982611
Out[27]:
array([[0.48381448, 0.44733828, 0.47259357, ..., 0.42026473, 0.48012425,
        0.45738394],
       [0.66910607, 0.68931802, 0.65237249, ..., 0.75034497, 0.74046857,
        0.72345587],
       [0.38479645, 0.2397086 , 0.39828887, ..., 0.38179699, 0.46610193,
        0.35895904],
       ...,
       [0.48381448, 0.44733828, 0.47259357, ..., 0.42026473, 0.48012425,
        0.45738394],
       [0.81189777, 0.84406186, 0.83747186, ..., 0.81625644, 0.82720313,
        0.83460204],
       [0.27022579, 0.08986812, 0.15789367, ..., 0.27539635, 0.38927052,
        0.23921427]])
In [28]:
The r-squared score of the model is  0.7803242052729248
The mean squared error is 0.007257742399143578
The mean absolute error is 0.06372099860263339
Out[28]:
array([[0.49853793, 0.44985924, 0.47627312, ..., 0.4084617 , 0.47467547,
        0.42883625],
       [0.67316662, 0.71634502, 0.7012054 , ..., 0.69610995, 0.74307074,
        0.71243783],
       [0.38563909, 0.27296238, 0.38878847, ..., 0.4084617 , 0.48574394,
        0.3713055 ],
       ...,
       [0.49853793, 0.44985924, 0.47627312, ..., 0.4084617 , 0.47467547,
        0.42883625],
       [0.76404429, 0.81174405, 0.71126433, ..., 0.71606915, 0.74307074,
        0.86763385],
       [0.30229231, 0.10686066, 0.15479463, ..., 0.30413334, 0.39682606,
        0.29261371]])

Best 3 Models

The 3 best models out of 8:

  • RandomForestRegressor r2score:0.8241
  • Ridge Regressor r2score:0.8206
  • DecisionTreeRegressor r2score:0.8111
In [29]:
Random Forest Regressor
The r-squared score of the model is  0.8238482838633492
The mean squared error is 0.005822374493894232
The mean absolute error is 0.05491695938369157
In [30]:
Ridge Regressor
The r-squared score of the model is  0.8206784847550181
The mean squared error is 0.005892834710396798
The mean absolute error is 0.0550506665220026
In [31]:
Decision Tree Regressor
The r-squared score of the model is  0.8111417388195891
The mean squared error is 0.006176943623503447
The mean absolute error is 0.05572576802012568

Because the data were scaled, the predicted values are in scaled units rather than the original ones. These three regressors give almost the same output.

Approach 2: Applying Machine Learning After Transposing

The 720 columns are not really independent features; they are longitude bins. So I transpose the data before training, and transpose the predictions back again when creating the maps.

  • No scaling this time; just the original dataset.
  • Using the previous best three models.
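
A hedged sketch of the transpose approach, on synthetic stand-in maps at a reduced 60×120 size (the real maps are 360×720, matching the shapes printed below):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
chem = rng.random((60, 120))                        # stand-in chemical map: lat x lon
albedo = 0.3 * chem + 0.01 * rng.random((60, 120))  # stand-in albedo map

# Treat each longitude column as one training sample:
# transpose so the shape becomes (samples, features).
X, y = chem.T, albedo.T
print("Previous Shape:", chem.shape)
print("New Shape:", X.shape)

model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
pred_map = model.predict(X).T   # transpose back to lat x lon before plotting
print(pred_map.shape)
```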
In [32]:
In [33]:
Previous Shape:  (360, 720)
New Shape: (720, 360)
In [34]:
Previous Shape:  (360, 720)
New Shape: (720, 360)
In [35]:
In [36]:
Ridge Regressor
The r-squared score of the model is  0.7583316312546846
The mean squared error is 0.0004236019095237334
The mean absolute error is 0.015253319996480918
In [37]:
Random Forest Regressor
The r-squared score of the model is  0.8918946519734852
The mean squared error is 0.0001690164860577503
The mean absolute error is 0.009147302069384204
In [38]:
Decision Tree Regressor
The r-squared score of the model is  0.8452233079604634
The mean squared error is 0.0002504941314241791
The mean absolute error is 0.010588625085733883

Approach 3: Applying Machine Learning After Dividing the Data

In this approach I divide the longitude range into pieces, train on each piece, and then combine the overall result.
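
The splitting step can be sketched as below, assuming 50-column pieces (matching the (360, 50) shapes printed in the outputs); the per-piece predictions would be stitched back together with `np.hstack`:

```python
import numpy as np

albedo = np.arange(360 * 720, dtype=float).reshape(360, 720)  # stand-in lat x lon map

# Slice the 720 longitude columns into 50-column pieces; 720 is not a
# multiple of 50, so the last piece is narrower (20 columns).
chunks = [albedo[:, i:i + 50] for i in range(0, 720, 50)]
print(len(chunks), chunks[0].shape, chunks[-1].shape)

# After training/predicting on each piece, recombine along longitude.
recombined = np.hstack(chunks)
print(recombined.shape)   # back to the full map shape
```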

In [39]:
In [40]:
In [41]:
(360, 50)
(360, 50)
In [42]:
In [43]:
Random Forest Regressor
The r-squared score of the model is  0.8482427941049103
The mean squared error is 0.00030380483438507586
The mean absolute error is 0.012862481367150475
In [44]:
Ridge Regressor
The r-squared score of the model is  0.6551948054956975
The mean squared error is 0.0006926786419496901
The mean absolute error is 0.020506585243798777
In [45]:
Decision Tree Regressor
The r-squared score of the model is  0.8455838441410823
The mean squared error is 0.0003101122951446502
The mean absolute error is 0.012948406851851852

There is no large effect on performance (r2 score), so I am not going further with this approach.

Predicting the Result

The Random Forest Regressor was found to be the best, and the best resulting maps come from transposing the input before training and re-transposing the predictions to get the required result. The best model has an r2 score of 0.8919.

In [46]:

Using Thorium

In [47]:
Previous Shape:  (360, 720)
New Shape: (720, 360)
Previous Shape:  (360, 720)
New Shape: (720, 360)
The r-squared score of the model is  0.8918946519734852
The mean squared error is 0.0001690164860577503
The mean absolute error is 0.009147302069384204
In [48]:

Using Titanium

In [49]:
Previous Shape:  (360, 720)
New Shape: (720, 360)
Previous Shape:  (360, 720)
New Shape: (720, 360)
The r-squared score of the model is  0.875955198251739
The mean squared error is 0.00019933983262024866
The mean absolute error is 0.009857598790144915
In [50]:

Using Iron

In [51]:
Previous Shape:  (360, 720)
New Shape: (720, 360)
Previous Shape:  (360, 720)
New Shape: (720, 360)
The r-squared score of the model is  0.8886840725905388
The mean squared error is 0.00017330046262062097
The mean absolute error is 0.009289001800626038
In [52]:

Using Potassium

In [53]:
Previous Shape:  (360, 720)
New Shape: (720, 360)
Previous Shape:  (360, 720)
New Shape: (720, 360)
The r-squared score of the model is  0.8885427605594604
The mean squared error is 0.00017571003608161408
The mean absolute error is 0.009305479964999627
In [54]:

Overall Result of Task-1

  • The Random Forest algorithm performs quite well for this task: an average r2 score of 0.8862 and an average mean squared error of 0.000179.
  • There is some noise in the predictions that would be reduced by training the algorithm on a larger dataset.

Task 2

The MESSENGER spacecraft mapped Mercury's surface from 2011 to 2015, including making full-surface albedo maps and partial element maps. For Mercury the albedo map is split into the top and bottom of the planet (mercury-albedo-top-half.png.csv and mercury-albedo_resized_botton-half.png.csv). Train your model on the top half. Training should attempt to identify relationships between albedo and chemistry. Chemical maps are:

  • alsimap_smooth (Al to Si element ratio),
  • casimap_smooth (Ca to Si element ratio),
  • fesimap_smooth (Fe to Si element ratio),
  • mgsimap_smooth (Mg to Si element ratio),
  • ssimap_smooth (S to Si element ratio),

Then, make a prediction about chemical composition for the bottom half of the planet using the albedo. Compare your prediction to the corresponding maps. How did your algorithm perform? Choose a metric to quantify your performance.

Data Loading

In [55]:
In [56]:
In [57]:
In [58]:
In [59]:
In [60]:
In [61]:

Data Wrangling

Unlike the Moon, where we can see a correlation between the lunar albedo and the chemical maps, there is no direct correlation in these maps. However, casimap_smooth (Ca to Si element ratio), fesimap_smooth (Fe to Si element ratio), and ssimap_smooth (S to Si element ratio) are strongly related to one another.

Applying Machine Learning and Predicting the Result

In this task I directly use the transpose approach with KNeighborsRegressor, which performs well on this data compared to the other models.
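
A hedged sketch of that step, on synthetic stand-in maps at a reduced 72×144 size (the real half-maps are 720×1440, matching the shapes printed below): fit albedo → chemistry on the top half, then predict chemistry for the bottom half from its albedo alone.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
top_albedo = rng.random((72, 144))                       # stand-in top-half albedo
top_chem = 0.4 * top_albedo + 0.02 * rng.random((72, 144))  # stand-in chemical ratio map
bottom_albedo = rng.random((72, 144))                    # stand-in bottom-half albedo

# Transpose so each longitude column is one sample, as in the Moon task.
knn = KNeighborsRegressor(n_neighbors=5).fit(top_albedo.T, top_chem.T)
bottom_chem_pred = knn.predict(bottom_albedo.T).T        # transpose back to lat x lon
print(bottom_chem_pred.shape)
```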

In [62]:

Using alsimap_smooth (Al to Si element ratio)

In [63]:
Previous Shape:  (720, 1440)
New Shape: (1440, 720)
Previous Shape:  (720, 1440)
New Shape: (1440, 720)

Using casimap_smooth (Ca to Si element ratio)

In [64]:
Previous Shape:  (720, 1440)
New Shape: (1440, 720)

Using fesimap_smooth (Fe to Si element ratio)

In [65]:
Previous Shape:  (720, 1440)
New Shape: (1440, 720)

Using mgsimap_smooth (Mg to Si element ratio)

In [66]:
Previous Shape:  (720, 1440)
New Shape: (1440, 720)

Using ssimap_smooth (S to Si element ratio)

In [67]:
Previous Shape:  (720, 1440)
New Shape: (1440, 720)

Overall Result of Task-2

The maps produced in this task are not very good, but they would improve with a larger dataset. Performance measurement for Task 2 cannot be done, because I used the chemical maps and the top-half albedo for training, as given in the first part, so I have nothing to compare against. Only if the bottom-half chemical composition were available could I measure the performance.

Problems Faced

The dataset was available only in limited quantity, so the machine learning algorithms could not predict better.

I had some difficulty understanding the tasks, especially Task 2, where I do not have ground-truth images to compare performance against.

There is still room for improvement, such as K-fold cross-validation, hyperparameter tuning, and ensemble regressors, but these techniques take a lot of training time. Neural networks could also be used, but they would need a larger dataset.
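
The K-fold and hyperparameter-tuning ideas could be sketched like this, on synthetic stand-in data (the grid values are illustrative, not tuned choices):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, GridSearchCV

rng = np.random.default_rng(3)
X = rng.random((200, 4))                       # stand-in chemical features
y = X.sum(axis=1) + 0.05 * rng.random(200)     # stand-in albedo target

# 5-fold cross-validated r2 instead of a single train/test split.
scores = cross_val_score(RandomForestRegressor(n_estimators=20, random_state=0),
                         X, y, cv=5, scoring="r2")
print(scores.mean())

# A small grid search over forest size and tree depth.
grid = GridSearchCV(RandomForestRegressor(random_state=0),
                    {"n_estimators": [10, 50], "max_depth": [None, 5]},
                    cv=3, scoring="r2").fit(X, y)
print(grid.best_params_)
```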
